Blueprint of a Cross-Lingual Web Retrieval Collection

Authors

  • Börkur Sigurbjörnsson
  • Jaap Kamps
  • Maarten de Rijke

Abstract

The World Wide Web is a natural setting for cross-lingual information retrieval: web content is essentially multilingual, and web searchers are often polyglots. Even though English has emerged as the lingua franca of the web, planning a business trip or holiday usually involves digesting pages in a foreign language. The same holds when searching for information about European culture, sports, economy, or politics. This paper discusses the blueprint of the WebCLEF track, a new evaluation activity addressing cross-lingual web retrieval within the Cross-Language Evaluation Forum (CLEF) in 2005.

Similar resources

NTCIR Workshop: an Evaluation of Cross-Lingual Information Retrieval

This paper introduces the first NTCIR Workshop, held Aug. 30 to Sept. 1, 1999, the first evaluation workshop designed to enhance research in Japanese text retrieval and cross-lingual information retrieval. The test collection used in the Workshop consists of more than 330,000 English and Japanese documents. Twenty-three groups from four countries have conducted IR tasks and submitted the searc...

Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation

Cyberspace is populated with valuable information sources, expressed in about 1,500 different languages and dialects. Yet, for the vast majority of web surfers this wealth of information is practically inaccessible or meaningless. Recent advancements in cross-lingual information retrieval, multilingual summarization, cross-lingual question answering and machine translation promise to narrow ...

Generating Cross-lingual Concept Space from Parallel Corpora on the Web

The information available in languages other than English on the World Wide Web is increasing significantly. Dictionaries are the most typical tools for crossing the boundaries between languages. However, a general-purpose dictionary is less sensitive to genre and domain, and it is impractical to manually construct tailored bilingual dictionaries or sophisticated multilingual thesa...

IIT at TREC-10

For TREC-10, we participated in the ad hoc and manual web tracks and in both the site-finding and cross-lingual tracks. For the ad hoc track, we did extensive calibrations and learned that combining similarity measures yields little improvement. This year, we focused on a single high-performance similarity measure. For site finding, we implemented several algorithms that did well on the data provi...

A Voting Mechanism for Named Entity Translation in English–Chinese Question Answering

In this paper, we describe a voting mechanism for accurate named entity (NE) translation in English–Chinese question answering (QA). This mechanism involves translations from three different sources: machine translation, online encyclopaedia, and web documents. The translation with the highest number of votes is selected. We evaluated this approach using test collection, topics and assessment r...

Journal:
  • JDIM

Volume 3

Publication date: 2005